GBE 



Extensive Gene Acquisition in the Extremely Psychrophilic 
Bacterial Species Psychroflexus torquis and the Link to 
Sea-Ice Ecosystem Specialism 

Shi Feng1, Shane M. Powell1, Richard Wilson2, and John P. Bowman1,*

1Food Safety Centre, Tasmanian Institute of Agriculture, University of Tasmania, Australia
2Central Science Laboratory, University of Tasmania, Australia
*Corresponding author: E-mail: john.bowman@utas.edu.au.
Accepted: December 20, 2013 

Data deposition: This project has been deposited at NCBI database under the accession CP003879 (Psychroflexus torquis) and APLF00000000
(Psychroflexus gondwanensis).

Abstract 

Sea ice is a highly dynamic and productive environment that includes a diverse array of psychrophilic prokaryotic and eukaryotic taxa
distinct from the underlying water column. Because sea ice has only been extensive on Earth since the mid-Eocene, it has been
hypothesized that bacteria highly adapted to inhabit sea ice have traits that have been acquired through horizontal gene transfer
(HGT). Here we compared the genomes of the psychrophilic bacterium Psychroflexus torquis ATCC 700755T, associated with both
Antarctic and Arctic sea ice, and its closely related nonpsychrophilic sister species, P. gondwanensis ACAM 44T. Results show that
HGT has occurred much more extensively in P. torquis in comparison to P. gondwanensis. Genetic features that can be linked to the 
psychrophilic and sea ice-specific lifestyle of P. torquis include genes for exopolysaccharide (EPS) and polyunsaturated fatty 
acid (PUFA) biosynthesis, numerous specific modes of nutrient acquisition, and proteins putatively associated with ice-binding, 
light-sensing (bacteriophytochromes), and programmed cell death (metacaspases). Proteomic analysis showed that several genes 
associated with these traits are highly translated, especially those involved with EPS and PUFA production. Because most of the genes 
relating to the ability of P. torquis to dwell in sea-ice ecosystems occur on genomic islands that are absent in closely related 
P. gondwanensis, its adaptation to the sea-ice environment appears driven mainly by HGT. The genomic islands are rich in pseudo- 
genes, insertional elements, and addiction modules, suggesting that gene acquisition is being followed by a process of genome 
reduction potentially indicative of evolving ecosystem specialism.

Introduction

Sea ice is a major feature of the surface of high-latitude 
oceans. It is relatively biologically productive due to extensive 
blooms of sea-ice algae, embedded in the ice floes as a band 
of growth or associated with the basal section of the ice floe 
that contacts the underlying seawater. Sea-ice algae and their 
epiphytic bacteria form the foundation of an active microbial 
loop comprising taxa distinct from the underlying seawater
(Bowman et al. 1997; Brown and Bowman 2001; 
Brinkmeyer et al. 2003; Bowman et al. 2012). Tightly coupled 
to algal-driven primary production, sympagic bacterial popu- 
lations increase 1-2 weeks after the phytoplankton bloom 
peaks in late summer and become increasingly dominant
when solar irradiance levels decline and algae subsequently
senesce and/or become dormant (Kottmeier et al. 1987; 
Kottmeier and Sullivan 1987; Grossman and Dieckmann 
1994; McMinn and Martin 2013). As sea ice forms, brine is 
ejected from the ice crystal matrix and collects in cracks and 
channels (referred to as brine channels), making up 5-20% of 
the sea-ice volume. Though a very cold and saline environ- 
ment, sea-ice microbial communities (SIMCO) thrive in sea-ice 
brines at temperatures of -10°C and at salinities three or 
more times the concentration of seawater (Thomas and 
Dieckmann 2002). 

The extent of most sea ice changes by more than an order
of magnitude between summer and winter. Long-term, stable
ice tends to occur only where it is connected to land at high latitudes.
Recent trends in Arctic Ocean sea ice decline and a simulta-
neous increase in Antarctic sea ice suggest that climate
change may have an impact on sea ice-associated taxa,
though the extent of this impact is difficult to predict
(Berge et al. 2012). Polar sea ice is geologically modern.
Based on detection of sea-ice diatom fossils and geological 
signatures indicating iceberg rafting, ice formation in high lat- 
itude oceans has been extensive since the mid-late Eocene 
~35-47 Ma. However, multiyear ice that would act as a
more stable sympagic platform than seasonal sea ice may 
not have appeared until as late as the Pliocene or 
Pleistocene (2.5-3 Ma) with the advent of sustained polar gla- 
ciation (Polyak 2010). In that time, microbial life associated 
with sea ice may have had the opportunity to specialize. 
Currently, our knowledge of the diversity of microbial sea- 
ice communities and their obligate sympagy remains limited 
(Bluhm et al. 2011; Poulin et al. 2011).

The genome sequence of the psychrophilic marine species 
Colwellia psychrerythraea (strain 34H) provided the first
genome-based perspective on the traits that allow not only 
for psychrophilic growth but also the possible means to grow 
and persist in sympagic ecosystems (Methe et al. 2005). The 
main traits examined included amino acid composition of pro- 
teins and their relation to tertiary structure, secreted and 
nonsecreted cold-active enzymes, omega-3 polyunsaturated 
fatty acids (PUFA), compatible solute synthesis, and secreted 
exopolysaccharides. Ice-active proteins that act to modify ice 
crystal structure have also been studied (Raymond et al. 2007; 
Bayer-Giraldi et al. 2010). The important sea ice-dwelling 
diatom Fragilariopsis cylindrus has several ice-active proteins 
orthologous to generally uncharacterized proteins in cold- 
adapted bacteria, suggesting that interdomain horizontal 
gene transfer (HGT) of these proteins may have occurred 
(Bayer-Giraldi et al. 2010). This raises the question of whether
other traits allowing inhabitation and successful competition 
in sea ice have also been acquired by HGT processes. 

In this study, we investigated the genomic properties of 
the extremely psychrophilic bacterial species P. torquis, an 
unusual member of the family Flavobacteriaceae (phylum 
Bacteroidetes) that has several traits linked to sea-ice inhabi- 
tation and dependence on algae via epiphytism. Psychroflexus 
torquis was originally isolated from algal assemblages in 
Antarctic multiyear sea ice. It differs from all other related 
species, including its closest relative P. gondwanensis, in 
being filamentous at an early stage of growth, extremely psy- 
chrophilic, able to synthesize omega-3 and omega-6 PUFA, 
and prolifically secreting soluble exopolysaccharides (EPS) 
(Bowman et al. 1998). The species, though chemoorgano- 
trophic, can also harness energy from light via proteorhodop- 
sin-driven proton pumping, a feature enhanced under 
osmotic stress (Feng et al. 2013). The genus Psychroflexus is 
found within moderately hypersaline ecosystems across the 
world; however, the combined traits of psychrophily and
PUFA synthesis in P. torquis make this species stand out
among other members of phylum Bacteroidetes. To explore
these and other ecologically relevant genomic aspects of P.
torquis that may provide insight into the relatively recent evo-
lution of psychrophily, we compared the genome of the type
strain ATCC 700755T to that of its closest relative
P. gondwanensis ACAM 44T. To better discern important
genes, we also performed comprehensive proteomics on
ATCC 700755T grown under a range of conditions. In partic-
ular, we searched for mobile genetic elements and pseudo-
genes as evidence for HGT and its possible role in sea-ice
ecosystem specialism.

Materials and Methods 

Genome Sequence Determination 

Psychroflexus torquis ATCC 700755T (T = type strain) and
P. gondwanensis ACAM 44T (ATCC 51278T) were cultivated
on modified marine agar (0.5% w/v proteose peptone, 0.2%
w/v yeast extract, 1.5% w/v agar, and 3.5% w/v sea salts) at 4
and 25 °C, respectively. High molecular weight DNA was ex-
tracted and purified from biomass using the Marmur method.
DNA was sequenced using the 454 GS-FLX/Plus (454 Life
Sciences, Branford, CT) platform following the manufac-
turer's de novo sequencing protocol. For ATCC 700755T
and ACAM 44T, 146.5 and 149.1 Mb of sequence data
(430-440 bp average length), respectively, were assembled
using Newbler v. 2.6 (454 Life Sciences). The Sanger sequence
draft already available for ATCC 700755T (generated through
the Gordon and Betty Moore Foundation Genome Sequencing
Project at the J. Craig Venter Institute) was compared with the
pyrosequenced contigs using Artemis (Carver et al. 2012), which
reduced the number of contigs from 39 to 9. Gaps between con-
tigs were closed using polymerase chain reaction (PCR) anal-
ysis. Apparent misassemblies and regions with sequence
inconsistencies were corrected in Artemis after PCR and se-
quencing confirmation of the regions.
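
The sequencing depth implied by these figures can be checked with simple arithmetic: mean fold coverage is the total number of sequenced bases divided by the genome length. The short Python sketch below is illustrative only and is not part of the original workflow; the function name is an assumption, and the genome sizes are those listed in table 1.

```python
def fold_coverage(sequenced_bases: float, genome_size_bp: float) -> float:
    """Mean fold coverage = total sequenced bases / genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sequenced_bases / genome_size_bp


# Pyrosequencing yield (from the text) and genome size in bp (from table 1).
runs = {
    "ATCC 700755T": (146.5e6, 4_321_832),
    "ACAM 44T": (149.1e6, 3_325_075),
}

coverage = {strain: fold_coverage(*values) for strain, values in runs.items()}
for strain, cov in coverage.items():
    print(f"{strain}: {cov:.1f}-fold")
```

Both values fall within one unit of the 33× and 44× pyrosequencing coverage reported in table 1, which suggests the tabulated figures were rounded down.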

Postsequence Analysis 

Gene annotation for the complete ATCC 700755T sequence
was carried out in Artemis and also compared with annota-
tions generated via the Prodigal server (Hyatt et al. 2010) and
Glimmer v. 3.02 (Delcher et al. 1999). Transfer RNAs were
predicted using tRNAscan-SE (Lowe and Eddy 1997).
Predicted CDSs were compared against the National Center
for Biotechnology Information (NCBI) database. Annotation
utilized the NCBI Prokaryotic Genomes Automatic
Annotation Pipeline. Pseudogenes in both genomes were de-
fined as described by Lerat and Ochman (2005) based on
comparisons with highly similar sequences in ATCC
700755T and ACAM 44T and with highly similar orthologs in re-
lated taxa. The presence of protein signal peptides and trans-
membrane helices was predicted using the SignalP 4.1
(Petersen et al. 2011) and TMHMM v. 2.0 servers (Centre for
Biological Sequence Analysis, Technical University of
Denmark), respectively. The genome of ATCC 700755T was
visualized using DNAPlotter (Carver et al. 2009). CRISPR pal-
indromic repeats were detected using the CRISPR recognition tool
(CRT) (Bland et al. 2007).

Protein Extraction and Posttreatment 

Psychroflexus torquis ATCC 700755T was grown on modified
marine agar at different sea salt salinities and light intensities
at 4 °C to examine the broadest possible set of proteins pro-
duced by P. torquis. The different salinities were achieved by
adding 17.5, 35, 52.5, and 70 g/l of sea salt (Red Sea) to the
marine agar, and three levels of illumination (0, 3-4, and 20-30
µmol photon s⁻¹ m⁻²) were used. Cells were lysed in
1 ml of 50 mM Tris-HCl buffer (pH 7.0) by sonication in an ice
bath, with 10 s of sonication followed by a 10 s wait period, cycled 15
times until the opaque cell suspension became translucent.
The suspensions were then centrifuged at 16,000 × g for
25 min at 4 °C. The supernatant protein was precipitated
using trichloroacetic acid and the protein pellets were then
treated with 0.2 M NaOH to improve subsequent solubiliza-
tion (Nandakumar et al. 2003). The pellets were then solubi-
lized using 100 µl of 50 mM Tris-HCl buffer (pH 7.0), and protein
concentration was determined using the Bradford assay (Bio-
Rad). Volumes of samples containing 50 µg of soluble protein
extract were then reduced in a solution of 50 mM dithiothrei-
tol, 100 mM ammonium bicarbonate for 1 h at room temper-
ature. The samples were then alkylated with 200 mM
iodoacetamide in 100 mM ammonium bicarbonate for 1 h
at room temperature. After alkylation, the reduction of pro-
teins was repeated, and they were digested in a buffer
(50 mM ammonium bicarbonate, 1 mM calcium chloride)
that contained sequencing grade modified trypsin
(Promega), at a sample protein to trypsin ratio of 25:1, at
37 °C with gentle shaking overnight. Digestion was stopped
by acidification with 10 µl of 10% (v/v) formic acid. The sam-
ples were then centrifuged for 5 min at 14,000 × g to remove
any insoluble material, and an aliquot (100-200 µl) of peptides
was transferred to high-performance liquid chromatography
(HPLC) vials for mass spectrometric analysis. Samples were
prepared with three biological replicates, and for each biolog-
ical replicate, two technical replicates were performed.
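
As a worked illustration of the quantities in this protocol, the sketch below computes the extract volume that contains 50 µg of protein for a given Bradford concentration, and the trypsin mass implied by the 25:1 protein-to-trypsin ratio. The function names and the example concentration of 2.5 µg/µl are hypothetical and are not taken from the study.

```python
def sample_volume_ul(target_ug: float, conc_ug_per_ul: float) -> float:
    """Volume of extract (µl) that contains target_ug of protein."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ug / conc_ug_per_ul


def trypsin_ug(protein_ug: float, protein_to_trypsin: float = 25.0) -> float:
    """Trypsin mass (µg) for a given protein:trypsin mass ratio."""
    if protein_to_trypsin <= 0:
        raise ValueError("ratio must be positive")
    return protein_ug / protein_to_trypsin


PROTEIN_UG = 50.0            # amount digested per sample (from the text)
EXAMPLE_CONC_UG_PER_UL = 2.5  # hypothetical Bradford result, for illustration

volume = sample_volume_ul(PROTEIN_UG, EXAMPLE_CONC_UG_PER_UL)
enzyme = trypsin_ug(PROTEIN_UG)
print(f"extract volume: {volume} µl, trypsin: {enzyme} µg")
```

Under the stated 25:1 ratio, each 50 µg sample would receive 2 µg of trypsin regardless of the extract concentration.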

NanoLC-Orbitrap Tandem Mass Spectrometry 

The separation of peptides utilized a Surveyor Plus HPLC
system fitted in line with an LTQ-Orbitrap XL mass spectro-
meter (ThermoFisher Scientific). Aliquots of peptide samples
were loaded at 0.05 ml/min onto a C18 capillary trapping
column (Peptide CapTrap, Michrom BioResources) controlled
by an Alliance 2690 separations module (Waters). Peptides
were then separated on an analytical HPLC column packed
with 5-µm C18 media (PicoFrit Column, 15 µm i.d. pulled tip,
10 cm, New Objective) using four linear gradient segments
controlled using a Surveyor MS Pump Plus (ThermoFisher
Scientific) at 200 nl/min. The solvent series included initially
5% acetonitrile in 0.2% formic acid (solvent A), shifting up
to 90% acetonitrile in 0.2% formic acid (solvent B). The four-
stage process where solvent B gradually replaced solvent A
comprised: 0-10% solvent B over 7.5 min; 10-25% solvent B
over 50 min; 25-55% solvent B over 20 min; and then 55-100%
solvent B over 5 min. This process was followed by reequili-
bration of the column with solvent A for 15 min. The
LTQ-Orbitrap XL was controlled using Xcalibur 2.0 software
(ThermoFisher Scientific) and operated in a data-dependent
acquisition mode whereby the survey scan was acquired in the
Orbitrap with a resolving power set to 60,000 (at 400 m/z).
MS/MS spectra were concurrently acquired in the LTQ mass
analyser on the seven most intense ions from the Fourier
Transform (FT) survey scan. Charge state filtering, where unas-
signed and singly charged precursor ions were not selected for
fragmentation, and dynamic exclusion (repeat count, 1;
repeat duration, 30 s; exclusion list size, 500) were used.
Fragmentation conditions in the LTQ were 35% normalized
collision energy, activation q of 0.25, 30 ms activation time,
and minimum ion selection intensity of 500 counts.
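
The four-segment gradient described above can be written as a small piecewise-linear function, which makes the total run time explicit. This is a sketch for illustration and not instrument method code; the names are assumptions, and the reequilibration step is represented simply as 0% solvent B.

```python
# (duration in min, starting %B, final %B) for each linear segment in the text.
SEGMENTS = [(7.5, 0, 10), (50, 10, 25), (20, 25, 55), (5, 55, 100)]
REEQUILIBRATION_MIN = 15  # column returned to solvent A


def percent_b(t_min: float) -> float:
    """Percentage of solvent B at t_min minutes after the gradient starts."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in SEGMENTS:
        if t_min <= elapsed + duration:
            # Linear interpolation within the current segment.
            return start + (end - start) * (t_min - elapsed) / duration
        elapsed += duration
    return 0.0  # reequilibration with solvent A


gradient_min = sum(duration for duration, _, _ in SEGMENTS)
total_min = gradient_min + REEQUILIBRATION_MIN
print(f"gradient: {gradient_min} min, full cycle: {total_min} min")
```

Summing the segments gives an 82.5 min gradient and a 97.5 min cycle per injection once the 15 min reequilibration is included.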

Protein Identification and Bioinformatics 

The acquired MS/MS data were converted to .mzXML peak list
files from .RAW files using the msConvert command in
ProteoWizard. The MS/MS data were searched against the
proteome of P. torquis ATCC 700755T (NCBI accession code
CP003879) employing X!Tandem running in the open-source
Computational Proteomics Analysis System environment
(Rauch et al. 2006). For searches, parent ion tolerance of
20 ppm and fragment ion mass tolerance of 0.5 Da were
used and enzyme cleavage was set to trypsin, allowing for a
maximum of two missed cleavages. Amino acid residue alter-
ations were also accounted for, including S-carboxamido-
methylation of cysteine residues specified as a fixed
modification, and cyclization of N-terminal glutamine to pyr-
oglutamic acid, deamidation of asparagine, hydroxylation of
proline, and oxidation of methionine specified as variable
modifications. Protein identifications were assessed by the
Peptide Prophet and Protein Prophet algorithms (Nesvizhskii
et al. 2003). Protein identifications were filtered by requiring a
Protein Prophet probability >0.7. This filtration constrains the
protein false discovery rate to <1%. Spectral counting was
used to determine relative protein abundance, taking into ac-
count the number of amino acid residues (Liu et al. 2004). To
acquire the maximum coverage of proteins, spectra obtained
from all samples were pooled together in this study. Filtration
removed 6.4% of peptides, with a total of 743,361 peptide spec-
tra matched to 2,598 protein identifications, with 1,936 pro-
teins possessing two or more unique peptides.
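
Length-corrected spectral counting, as mentioned above, can be sketched as follows. The text does not give the exact normalization, so this example assumes a normalized spectral abundance factor (counts divided by protein length, then scaled to sum to 1), which is one common scheme and may differ from the calculation actually used. Protein names, counts, and lengths are invented for illustration.

```python
def normalized_spectral_abundance(counts: dict, lengths: dict) -> dict:
    """Divide spectral counts by protein length, then scale to sum to 1."""
    saf = {}
    for protein, count in counts.items():
        length = lengths[protein]
        if length <= 0:
            raise ValueError(f"invalid length for {protein}")
        saf[protein] = count / length
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra matched")
    return {protein: value / total for protein, value in saf.items()}


# Hypothetical spectral counts and protein lengths (amino acid residues).
counts = {"prot_A": 120, "prot_B": 120, "prot_C": 30}
lengths = {"prot_A": 300, "prot_B": 600, "prot_C": 150}

nsaf = normalized_spectral_abundance(counts, lengths)
for protein, value in nsaf.items():
    print(f"{protein}: {value:.2f}")
```

In this toy case prot_A and prot_B have equal raw counts, but prot_A is half the length, so its length-corrected abundance is twice as high.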
Results and Discussion

A Highly Conserved Core Genome Is Shared by P. torquis 
and P. gondwanensis 

Based on a meta-analysis of the latest metagenome and NCBI
database records, P. torquis is restricted to sea ice or seawater
around ice floes and has a bipolar distribution, occurring in
both the Arctic and the Antarctic (fig. 1). Psychroflexus gond-
wanensis, the closest cultivated relative of P. torquis, so far is
only known to reside in Antarctic hypersaline lakes, where it
can be a dominant member of the microbial community (Yau
et al. 2013). An equidistant lineage detected in the ice cover of
Lake Vida, Antarctica (Mosier et al. 2007), and in salinated
Yellow River delta soil from China, is so far uncultured but
suggests a broader distribution of closely related Psychroflexus
genotypes. The 4.32 Mb genome of the P. torquis type strain
ATCC 700755T was obtained in two stages, first as an 8×
coverage Sanger-sequenced draft, and then closed in this
study by a combination of pyrosequencing, gap filling, and
subsequent PCR-based checks of potentially misassembled
regions. The 3.32 Mb draft genome of P. gondwanensis
type strain ACAM 44T (NCBI accession code APLF00000000)
was obtained at a coverage level of 44-fold. Details for the
genomes are summarized in table 1.

The 16S rRNA gene sequence similarity between P. torquis
and P. gondwanensis strains and clones averages 99.0%,
higher than the empirical 98.5% similarity cut-off that has
often been used as preliminary evidence for defining distinct
prokaryotic species (Stackebrandt et al. 2006). However, a full
genome comparison between the two genomes yielded an
average nucleotide identity (ANI; Goris et al. 2007) of only
68%, consistent with an overall low DNA:DNA hybridization
level of <20% (Bowman et al. 1998). Despite the differences
in size and ANI score, both strains share 2,308 genes out of an
effective pan genome of 4,225 protein-coding genes. The
genes that are in common have a high mean nucleotide sim-
ilarity (92 ± 4%) and extensive stretches of synteny (supple-
mentary table S1, Supplementary Material online).

A dissection of genes by functional classification demon- 
strates that the conserved gene overlap includes virtually all 



[Fig. 1: phylogenetic tree; tip labels listed top to bottom]
Psychroflexus halocasei LMG 25857T (FR714910) [raclette semi-hard style cheese]
MIC 1008 (JQ390292) [marine sample, Korea]
Tat-08-015-51-157 (GU437622) [El Tatio Geyser Field sediment, Chile]
Psychroflexus salinarum KCTC 22483T (EU874390) [marine solar saltern, Korea]
"Psychroflexus lacisalsi" (AB381940) [Hunazoko Lake, Antarctica]
LAS B39N (AF513958) [Lake Laysan, Hawaii]
LAI B21N (AF513959) [Lake Laysan, Hawaii]
SINH517 (HM128606) [hypersaline lake, Tibetan Plateau, China]
SINI580 (M126952) [hypersaline lake, Tibetan Plateau, China]
SINI810 (HM127144) [hypersaline lake, Tibetan Plateau, China]
Tat-08-015-51-56 (GU437571) [El Tatio Geyser Field sediment, Chile]
Tat-08-015-51-119 (GU437616) [El Tatio Geyser Field sediment, Chile]
EG26 (AM691089) [hypersaline spring, Canada]
Psychroflexus tropicus ATCC BAA734T (NR_028554) [Lake Laysan, Hawaii]
SB27 (EU722652) [sediment, athalassohaline lagoon, Spain]
Tirl9 (EU725598) [water, athalassohaline lagoon, Spain]
Tirl7 (EU725597) [water, athalassohaline lagoon, Spain]
Psychroflexus sediminis KCTC 22166T (NR_044410) [saline lake sediment, Tibet, China]
SINH970 (HM128393) [hypersaline lake, Tibetan Plateau, China]
SINH432 (HM128002) [hypersaline lake, Tibetan Plateau, China]
SINH609 (HM128137) [hypersaline lake, Tibetan Plateau, China]
SINH1044 (HM127889) [hypersaline lake, Tibetan Plateau, China]
SINH804 (HM128275) [hypersaline lake, Tibetan Plateau, China]
Tat 08-015-51-49 (GU437566) [El Tatio Geyser Field sediment, Chile]
Psychroflexus gondwanensis ACAM 44T (FR849911) [Organic Lake, Antarctica]
R 39078 (FR772275) [saline lake, Antarctica]
ANTLV7 Gil (DQ521543) [Lake Vida ice cover, Antarctica]
BF99 C105 (DQ677858) [Yellow River delta soil, China]
antwl407 (JF811035) [Antarctic sea-ice]
RCC2388 [Arctic Ocean seawater]
BSi20642 (DQ007442) [Arctic sea-ice]
RCC2407 (JX863358) [Arctic Ocean seawater]
ANT9232b (AY167318) [Antarctic pack-ice]
ANT9268 (AY167320) [Antarctic pack-ice]
TIG (JQ753220) [Antarctic sea-ice]
W135 (JQ753207) [Antarctic sea-ice]
Psychroflexus torquis ATCC 700755T (CP003879) [Antarctic sea-ice]
ARK10255 (AF468429) [Arctic sea-ice]
ARK10063 (AF468410) [Arctic sea-ice]
gap d 25 (DQ530461) [Antarctic sea-ice]



Fig. 1. — The 16S rRNA gene-based phylogenetic tree of the genus Psychroflexus (family Flavobacteriaceae, phylum Bacteroidetes). The tree was
constructed with complete or near-complete sequences, aligned with ClustalW, clustered with the maximum likelihood algorithm, and created using
Neighbor-Joining. Black circles indicate that bootstrapping support for nodes is >80%. The outgroup used was Capnocytophaga ochracea. The bar indicates
relative sequence distance. NCBI accession codes are given in parentheses followed by location of isolation. For type strains of described species, cardinal
temperatures for growth rate are indicated in the inset bar graph including values for temperature: minimum (Tmin), optimal (Topt), and maximum (Tmax)
temperature values. Values were determined in liquid media, with the minimum temperature a theoretical value calculated from the square root model of
Ratkowsky et al. (1983).
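
The theoretical minimum temperature mentioned in the caption comes from extrapolating growth-rate data. A minimal sketch of that idea is given below, assuming only the suboptimal (linear) part of the square root model, sqrt(rate) = b (T - Tmin); the full model of Ratkowsky et al. (1983) adds a high-temperature term that is omitted here. The data are synthetic, generated from arbitrary parameters, and are not measurements from this study.

```python
import math


def fit_tmin(temps_c, rates):
    """Least-squares fit of sqrt(rate) = b * (T - Tmin); returns (b, Tmin)."""
    if len(temps_c) != len(rates) or len(temps_c) < 2:
        raise ValueError("need at least two paired observations")
    y = [math.sqrt(r) for r in rates]
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_y = sum(y) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(temps_c, y))
    b = sxy / sxx
    intercept = mean_y - b * mean_t
    # The fitted line crosses zero growth at T = Tmin = -intercept / b.
    return b, -intercept / b


# Synthetic data from arbitrary parameters b = 0.02 and Tmin = -15 °C.
temps = [0, 2, 4, 6, 8, 10]
rates = [(0.02 * (t + 15)) ** 2 for t in temps]

b, tmin = fit_tmin(temps, rates)
print(f"b = {b:.3f}, Tmin = {tmin:.1f} °C")
```

Because Tmin is an extrapolated intercept rather than an observed growth limit, it is best read as a model parameter, which is why the caption describes it as theoretical.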
Table 1

Genome Data for P. torquis ATCC 700755T and P. gondwanensis ACAM 44T

Property | P. torquis | P. gondwanensis(a)
Strain | ATCC 700755T (= ACAM 623T) | ACAM 44T (= ATCC 51278T, DSM 5423T)
Taxonomic hierarchy | Flavobacteriaceae, Flavobacteriales, Flavobacteria, and Bacteroidetes | Flavobacteriaceae, Flavobacteriales, Flavobacteria, and Bacteroidetes
Genome status | Finished | Noncontiguous draft
Platform | Sanger (JCVI), 454 GS FLX | 454 GS FLX
Coverage | 8×, 33× | 44× (62 contigs)
Number of replicons | 1 | 1
Extrachromosomal elements | 0 | 0
GenBank ID | CP003879, NC_018721 | APLF00000000
Genome size (bp) | 4,321,832 | 3,325,075
DNA coding region (bp) | 3,503,343 (81.06%) | 2,852,146 (85.77%)
DNA G + C content | 34.51% (34.9/33.5) | 35.72% (36.0/34.4)
Total genes | 3,951 | 3,007
RNA genes | 45 (1.14%) | 47 (1.61%)
rRNA operons | 3 | 3
Protein coding genes | 3,526 | 2,912
Pseudogenes | 379 (9.57%) | 48 (1.60%)
Genes with a predicted function | 2,278 (64.58%) | 2,006 (68.32%)
Proteins with signal peptides or POR secretion system sorting domains | 380 (10.78%) | 299 (10.18%)
Proteins with transmembrane domains | 761 (21.58%) | 658 (22.41%)

(a) Estimates based on available noncontiguous sequence data.




RNA-coding genes and genes involved in fundamental cellular
processes, including DNA-related processes, protein transla-
tion, ribosome structure and biogenesis, tRNA processing,
protein secretion, cytokinesis, nucleic acid transport/metabo-
lism, cofactor transport/metabolism, and gliding motility
(fig. 2). Many of these genes have similarly colocated and 
syntenic arrangements among related taxa within the family 
Flavobacteriaceae, demonstrating an apparent ancestral 
nature. Other functional classes more moderately conserved 
between the two species include metabolism, protein modifi- 
cation, folding, and turnover, and defence/detoxification sys- 
tems. Overall, this pattern of gene sharing is consistent with 
both P. torquis and P. gondwanensis being inhabitants of cold 
saline ecosystems and having broadly similar though not iden- 
tical morphological and metabolic phenotypes (Bowman et al. 
1998). 

Evidence of Extensive HGT in the Species-Dependent
Section of the P. torquis ATCC 700755T Genome

Large tracts of the ATCC 700755T and ACAM 44T genomes
share no significant nucleotide similarity, comprising 1,265 and 653
strain-distinct genes, respectively. The structural and functional
categorization of the species-dependent genome regions
reveals that, in general, P. torquis ATCC 700755T has a wider
variety of functional genes in this set compared with P. gond-
wanensis ACAM 44T (fig. 2, supplementary tables S1 and S2,
Supplementary Material online). When directly compared
against the ACAM 44T genome (fig. 3), these regions occur
in distinct genomic islands (GIs), of which 44 could be defined
(size range 3.7-153.2 kb, supplementary table S1,
Supplementary Material online). Fourteen of these GIs are
flanked by tRNA genes, and the majority are characterized
by low gene density (average 74.5%) and are relatively rich
in pseudogenes, insertional elements, and/or addiction mod-
ules. The presence of tRNA genes flanking GIs is consistent
with tRNA genes being well-known hotspots for site-specific
recombination (Ou et al. 2006). One tRNA gene (tRNA-Met between
P700755_00961/00963) has been disrupted in ATCC
700755T by an integrase, suggesting other tRNAs may have
also been lost by GI insertions. An example of this may in-
clude a tRNA-Ala-GGC present as a pair in ACAM 44T but only
singly in ATCC 700755T (between P700755_01366/01368),
located directly adjacent to a 153.2 kb GI (GI no. 17; supple-
mentary table S1, Supplementary Material online).

Fig. 2. — Numbers of genes shared or strain dependent in P. torquis
ATCC 700755T and P. gondwanensis ACAM 44T organized by functional
class.

Fig. 3. — Genome map of P. torquis ATCC 700755T drawn using DNAPlotter (Carver et al. 2009). The first and second rings (blue) show gene
annotations for the sense and antisense strands, respectively. The third ring (brown) shows the position of annotated pseudogenes. The fourth ring (green)
shows genes that occur in P. torquis ATCC 700755T but not P. gondwanensis ACAM 44T. On this ring (in red) are the positions of the three rRNA operons,
which are located in syntenic regions of the ACAM 44T genome. The fifth ring (purple) shows the position of tRNA coding regions. The sixth ring
demonstrates GC-bias (black, positive; gray, negative) across the genome calculated from 1,000-bp segments. The inner ring shows GC-skew with the
leading strand commencing at the predicted oriC.

Variation in G + C content can be important evidence for
HGT in regions of a genome (Ochman
et al. 2000). Thus, we plotted the overall GC% of P. torquis
using DNAMAN. The overall GC% of P. torquis is 34.5%, and
we were able to locate several regions that deviated from this
mean, ranging between 26% and 50%, with 15 GIs located
within these areas (supplementary fig. S1, Supplementary
Material online), suggestive of an external origin and ancestry
of these genomic regions. Furthermore, comparisons of GI-
associated genes with the NCBI database were made, and the
highest match was noted on the basis of amino acid similarity.
A high proportion of genes (27%) did not match anything in
the NCBI database, while 45% of matches occurred with pro-
teins from members of the family Flavobacteriaceae that
mainly occur in marine, especially polar, ecosystems (e.g.,
Polaribacter spp., Cellulophaga algicola, Gillisia spp.). The re-
mainder of matches were with a wide variety of bacteria
within the phylum Bacteroidetes and in other phyla (supple-
mentary table S1, Supplementary Material online). A similar
pattern was observed for ACAM 44T, though more muted in
terms of the sheer scale of gene acquisition. Sea ice has been
proposed as a hotspot for genetic recombination due to its
high density of bacteriophage, a result of the concentration of
brines during sea-ice formation (Wells and Deming 2006).
Despite no genes of definitive phage origin being found on
the ATCC 700755T genome, extensive phage defence
systems were detected, including four large regions contain-
ing 7-28 clustered regularly interspaced short palindromic
repeats (CRISPR), with the main concentration located
immediately after a cluster of CRISPR genes (P700755_00291 to
00295). In addition, six restriction-modification systems and a
variety of proteins in the phage infection resistance family
were detected (supplementary table S1, Supplementary Mate-
rial online). Approximately 4% of genes were insertion
sequence (IS) elements belonging to 40 families of retron-as-
sociated reverse transcriptase, integrase, and transposase pro-
teins. Ten families of addiction (toxin/antitoxin) modules were
also present. These gene types and their distribution suggest
that mobile genetic elements and, potentially, phages were
involved in building GI-associated genomic content. Overall,
the data suggest that there has been a high level of gene
acquisition in P. torquis ATCC 700755T. Based on the collec-
tive nature of the GIs as explained above, we hypothesize that
HGT processes drove this acquisition. Because direct evidence
such as prophage and conjugative systems is no longer ev-
ident on the genome, it would be ideal to collect data from a
wider range of strains as well as other sea-ice-derived bacteria
to assess the degree of HGT and whether there is a pool of
shared genetic homologs within SIMCO.
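
The G + C screen described in this section can be approximated with a sliding-window calculation. The sketch below is not the DNAMAN procedure used in the study: the 1,000-bp window follows the segment size quoted in the fig. 3 caption, while the 5-percentage-point deviation threshold, the function names, and the toy sequence are assumptions chosen for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def atypical_windows(genome: str, window: int = 1000, step: int = 1000,
                     max_dev: float = 0.05):
    """Return (start, gc) for windows deviating from the genome-wide mean."""
    mean_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - mean_gc) > max_dev:
            flagged.append((start, gc))
    return flagged


# Toy sequence: a 30% GC background with a 1 kb, 80% GC insert at 9,000 bp.
background = "AATTAGCTAC" * 900
island = "GCGCATGCGC" * 100
toy_genome = background + island + background

flagged = atypical_windows(toy_genome)
print(flagged)
```

Compositional deviation of this kind is only suggestive, since older acquisitions tend to drift toward the host composition over time, so it is best combined with other evidence such as flanking tRNA genes and mobile elements, as done in this section.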

Modern Evolution of Psychrophily in P. torquis ATCC
700755T and Its Relation to Genome Sequence-Derived
Criteria

Psychrophilic prokaryotes, unlike thermophilic taxa, are neither
phylogenetically deep-branching nor tend to cluster together.
In terms of phylogenetic relationships, psychrophilic strains
tend to occur almost without exception in genera with
higher temperature adapted relatives and are usually located
at the tips of branches. This aspect of psychrophilic prokaryote
phylogeny was first noted by Franzmann (1996).
Though it is conceivable that various cold-adapted microbes
evolved during earlier cold times on Earth, for example during
the purported Cryogenian period (MacDonald et al. 2010) and
Snowball Earth periods (Hoffman and Schrag 2002), psy-
chrophily amongst prokaryotes is a modern phenomenon
based on the temperature history of Earth and underlying
prokaryotic evolution (Schwartzman and Lineweaver 2005).
As seen in figure 1, the deeper branching Psychroflexus species
are classic mesophiles and derive from widespread locations,
such as geyser field soils, marine sites, saline lakes, and even
cheese. The tip position of P. torquis in the Psychroflexus phy-
logenetic tree (fig. 1) and its apparent restriction to polar
marine locations, specifically multiyear sea ice, fit the concept
of a modern evolution of psychrophily. The cardinal growth
temperatures of P. gondwanensis and P. torquis differ by more
than 10 °C and this value is even greater for other
Psychroflexus spp. (fig. 1).

Typically amino acid composition in general terms is cor- 
related to growth temperature preference, but in the case 



of the strains studied here overall amino acid content did 
not vary significantly. Specific amino acids He, Val, Tyr, Trp, 
Arg, Glu, Leu (IVYWREL) have been found to be strong 
determinants of thermostability (Zeldovich et al. 2007); how- 
ever, predictions based on the levels of these amino acids 
for ATCC 700755"^ and ACAM 44"^ overestimated their op- 
timal growth temperatures as 29 and 34 °C, respectively. 
This result is not surprising in that predictions of thermoa- 
daptation are relatively insensitive at the mesophilic/psychro- 
philic end of the biokinetic spectrum, suggesting an inherent 
limit to thermostability of proteins and thus growth (Corkrey 
et al. 2012). Codon usage analysis was also performed to 
determine whether any significant trends occurred between 
the strains based on highly translated gene products. This 
analysis was done by comparing the top ten percentile of 
the most abundant proteins of ATCC 700755^T that had 
highly similar orthologs in ACAM 44^T (n = 306) 
determined from the protein spectral count data set (sup- 
plementary table S1, Supplementary Material online). 
Analysis of the expected codon adaptation index (Puigbo 
et al. 2008) revealed significant bias, mainly due to differ- 
ences in synonymous third-base positions being more AT- 
rich (average % GC3: 27 vs. 35). Overall, this suggests that 
amino acid content and codon usage criteria can discrimi- 
nate between the two species examined; however, the 
trends may not necessarily relate only to psychrophily. A 
similar situation was found with the extreme psychrophile 
Psychromonas ingrahamii, which did not exhibit unusual 
codon usage trends (Riley et al. 2008) nor was its level of 
IVYWREL amino acid content informative. Another explana- 
tion for temperature sensitivity is that P. torquis is inherently 
unstable at mesophilic temperatures and, because pro- 
teins do not seem obviously involved, that an- 
other part of the cell is the temperature "Achilles heel." A 
logical candidate is the cell membrane of ATCC 700755^T, 
which is quite different from that of ACAM 44^T as it is rich 
in PUFA and anteiso-branched fatty acids (Bowman et al. 
1998). This combination creates membrane fluidity compat- 
ible with very low temperature even though it has the dis- 
advantage of being potentially more thermolabile (D'Aoust 
and Kushner 1971). 
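
As a rough illustration of the two sequence statistics discussed above, the following Python sketch computes the pooled IVYWREL fraction of a set of protein sequences and the GC content at third codon positions (GC3) of a set of coding sequences. It is not the pipeline used in this study. The function names are hypothetical, and the default slope and intercept in predicted_topt are the linear fit commonly attributed to Zeldovich et al. (2007); they are an assumption here and should be checked against that source before use.

```python
from collections import Counter

IVYWREL = set("IVYWREL")


def ivywrel_fraction(protein_seqs):
    """Pooled fraction of residues that are I, V, Y, W, R, E or L."""
    counts = Counter()
    for seq in protein_seqs:
        counts.update(seq.upper())
    # Ignore non-letter symbols such as '*' (stop) or '-' (gap).
    total = sum(n for aa, n in counts.items() if aa.isalpha())
    hits = sum(n for aa, n in counts.items() if aa in IVYWREL)
    return hits / total if total else 0.0


def predicted_topt(fraction, slope=937.0, intercept=-335.0):
    """Linear IVYWREL-to-temperature estimate in degrees C.

    The default coefficients are assumed, not taken from this paper.
    """
    return slope * fraction + intercept


def gc3_percent(cds_seqs):
    """Percent G or C at third codon positions of in-frame coding sequences."""
    gc = total = 0
    for cds in cds_seqs:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
        for i in range(2, usable, 3):
            base = cds[i]
            if base in "ACGT":  # skip ambiguous bases
                total += 1
                gc += base in "GC"
    return 100.0 * gc / total if total else 0.0


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from either genome.
    print(ivywrel_fraction(["MKVLIYWRE", "AAGGSTP"]))
    print(gc3_percent(["ATGGCCTTA"]))
```

Because the linear estimate depends only on a pooled residue fraction, small compositional differences between two closely related genomes translate into small differences in predicted temperature, which is consistent with the limited sensitivity noted above at the psychrophilic end of the range.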

Evidence of Significant Gene Decay in the P. torquis 
ATCC 700755^T Genome 

The genome of ATCC 700755^T has an unusually large 
number of pseudogenes (n = 379), making up 9.5% of 
the total number of protein coding genes. This percentage 
represents a conservative 8-fold increase over that of the 
ACAM 44^T genome, for which pseudogene numbers have 
likely been overestimated due to its draft status. The appear- 
ance of pseudogenes is believed to be associated with 
recent mutational processes because they seem to be rapidly 
deleted from genomes (Kuo and Ochman 2010). In ATCC 
700755^T the pseudogenes primarily occurred as truncated 
fragments or segmented open reading frames (ORFs) due to 
one or more nonsense mutations and/or indels, while more 
rarely, pseudogenes were derived from direct transposon or 
retron disruptions. In the case of the more overtly truncated 
ORFs, most have been affected by subsequent frameshifts 
and partial deletion because the coding region remnants 
were usually less than 40% of the full length version, 
with the remaining degenerate region sometimes still adja- 
cent to the pseudogene. Indeed, numerous examples of 
fragmentary relics lacking both stop and start codons 
were detectable in intergenic regions. Insertion sequence (IS) ele- 
ment types are very diverse in P. torquis ATCC 700755^T, 
with a high proportion being pseudogenes (supplementary 
table S3, Supplementary Material online). Given that many 
IS elements and other mobile genetic elements are concen- 
trated in GIs, insertion and recombination appear to have 
shaped the genome of ATCC 700755^T extensively. Such 
high proportions of pseudogenes essentially indicate a pro- 
cess of both gene decay and adaptation, as has been ob- 
served in bacteria transitioning to a lifestyle of obligate 
parasitism or symbiosis (Burke and Moran 2011). 
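
The proportions quoted above can be cross-checked with simple arithmetic. The sketch below (function names are hypothetical) back-calculates the protein-coding gene total implied by 379 pseudogenes at 9.5%, and the pseudogene percentage implied for the comparison genome by an 8-fold difference. Both results are derived approximations from the rounded figures in the text, not values reported in this study.

```python
def implied_total_genes(pseudogenes, percent_of_genes):
    """Gene total implied by a pseudogene count and its percentage share."""
    return pseudogenes / (percent_of_genes / 100.0)


def implied_reference_percent(percent, fold_increase):
    """Percentage in the comparison genome implied by a fold difference."""
    return percent / fold_increase


if __name__ == "__main__":
    total = implied_total_genes(379, 9.5)       # roughly four thousand genes
    reference = implied_reference_percent(9.5, 8)  # a little over 1%
    print(round(total), reference)
```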

The large number of pseudogenes in ATCC 700755^T rela- 
tive to ACAM 44^T suggests that HGT gene acquisition may 
have been both advantageous and deleterious. Unnecessary 
genes, copies of genes involved in the HGT processes them- 
selves, as well as those accidentally disrupted via integration 
events have been and are being progressively deleted. This 
process may be due to selection against pseudogenes them- 
selves or selection of processes that actually remove them 
from genomes (Kuo and Ochman 2010). Other strains will 
need to be examined to determine whether this pseudogene 
decay is consistent at a species level and if the "burden" of 
pseudogenes correlates to fitness. The predicted location 
of the point of origin of replication (oriC), detected using 
DoriC v. 5.0 (Gao et al. 2012), is of interest in this regard. 
Surprisingly, two putative oriC sites were found, both located 
near each other within a 31.2 kb GI (no. 22) immediately 
adjacent to retron-type reverse transcriptases (between 
P700755_01930/01931; P700755_01949/01953), the latter 
of which is a pseudogene. 

Pseudogenes are generally assumed not to be expressed or 
translated, although exceptions have been detected (Rusk 
2011). Based on our proteomic spectral count data, the vast 
majority of pseudogenes were not detected after filtration 
(supplementary table S3, Supplementary Material online). 
However, some pseudogenes have a substantial number of 
collated spectral counts that had high confidence of identifi- 
cation. In all cases, these peptides occurred on IS elements 
that occurred multiple times in the genome either as full 
length genes or truncated pseudogene versions. It is possible 
that truncated protein products are still translated but in gen- 
eral appear to have low cellular abundance based on the 
spectral count data. 



Functional Prioritization Suggested by Abundance of 
Protein Products in P. torquis ATCC 700755^T Genome 

We assessed to what degree the genes of ATCC 700755^T are 
translated using proteomic analysis. It is assumed that the 
more abundant proteins are inherently fundamental to the 
system biology of ATCC 700755^T. Proteins above an arbitrary thresh- 
old set at 0.005% of the filtered and normalized total spectral 
count were regarded as being abundant. At this 
threshold, spectral counts for multiple peptides were observed 
in most sample replicates, thus suggesting sustained produc- 
tion under the range of growth conditions tested. The prote- 
omic data set is, however, logistically limited due to huge 
dynamic ranges, with a natural bias against low abundance 
and transmembrane proteins (Borg et al. 2011). Many pro- 
teins belonging to the overlapping proteomes of ATCC 
700755^T and ACAM 44^T were readily detected, as expected, 
as were high proportions of functionally conserved, mainly 
cytosolic proteins (fig. 4). Among these, the least detected 
proteins were associated with DNA-related processes, includ- 
ing repair and recombination (only 20% of proteins observed), 
while the highest proportion (63%) was associated with elec- 
tron transport. This difference may indicate the prioritization 
of proteins in terms of cellular processes, where certain func- 
tional proteins such as DNA repair are only upregulated to 
high levels when needed. Other proteins, such as those in- 
volved in central metabolism and energy production, are con- 
stantly required by the organism. At the other extreme, some 
functional groups of proteins from the ATCC 700755^T 
genome were rarely detected, including addiction modules, 
foreign defence proteins, IS elements, and a high proportion 
of proteins with seemingly no function at all based on their 
lack of conserved domains. The weak translation of these 
latter proteins suggests that their presence could be largely 
strain dependent (Ou et al. 2005). 

Fig. 4. Distribution of most abundant proteins of P. torquis ATCC 
700755^T and whether they are strain-specific or shared with the genome 
of P. gondwanensis ACAM 44^T, grouped by functional classes. Proteomic 
data were pooled from all treatment samples; an abundant protein was 
defined as 0.005% of the total spectral counts, and each was detected in 
most replicates and represented by multiple peptide sequences. 
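
The abundance criterion described in this section can be expressed compactly. The following minimal sketch assumes that spectral counts have already been filtered, normalized, and pooled into a dictionary keyed by locus tag; that input layout, the function names, and the example values are all hypothetical. It flags proteins at or above 0.005% of the total count and tabulates the percentage of proteins detected per functional class. It does not reproduce the additional requirements of detection in most replicates and representation by multiple peptides.

```python
from collections import defaultdict


def abundant_proteins(spectral_counts, threshold_percent=0.005):
    """Return {locus_tag: percent_of_total} for proteins meeting the threshold."""
    total = sum(spectral_counts.values())
    if total <= 0:
        return {}
    shares = {tag: 100.0 * n / total for tag, n in spectral_counts.items()}
    return {tag: s for tag, s in shares.items() if s >= threshold_percent}


def detected_share_by_class(functional_class, detected):
    """Percent of proteins in each functional class that were detected.

    functional_class maps locus tag to class name; detected is a set of tags.
    """
    seen = defaultdict(int)
    size = defaultdict(int)
    for tag, cls in functional_class.items():
        size[cls] += 1
        if tag in detected:
            seen[cls] += 1
    return {cls: 100.0 * seen[cls] / size[cls] for cls in size}


if __name__ == "__main__":
    # Made-up counts and class labels, for illustration only.
    counts = {"geneA": 99990, "geneB": 6, "geneC": 4}
    classes = {"geneA": "electron transport", "geneB": "electron transport",
               "geneC": "DNA repair", "geneD": "DNA repair"}
    abundant = abundant_proteins(counts)
    print(sorted(abundant))
    print(detected_share_by_class(classes, set(counts)))
```

Expressing the threshold as a share of the pooled total, rather than as a raw count, keeps the criterion comparable across data sets of different size, although it remains subject to the dynamic-range bias noted above.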

Genes That Code Abundant Proteins in P. torquis ATCC 
700755 Are Potentially Important for Sea-Ice Inhabitation 

We assume that species and/or strain-specific features (as 
summarized in fig. 2 in terms of gene distribution) greatly 
contribute to the inherent uniqueness of bacterial strains at 
a phenotypic level and subsequently determine their ecologi- 
cal nature. Features that are strongly expressed likely have 
important roles in defining this identity. Based on protein 
abundance analysis, the most prominent products of the 
unique genome regions of P. torquis ATCC 700755^T (fig. 4) 
are proteins associated with cell envelope biogenesis, cell sur- 
face proteins/adhesins, proteins involved in the transport and 
metabolism of carbohydrates, lipid and inorganic ion transport 
and metabolism, and those so far with only generalized func- 
tions. Many strain-dependent features are likely not well ob- 
served due to the inability of laboratory-based growth 
conditions to adequately capture this information. Given 
that sea-ice ecosystem conditions are highly complex in 
terms of resource availability, physicochemical pressures, and 
biological interactions, many other more transiently expressed 
proteins potentially play important roles. Nevertheless, these 
experiments provide a preliminary view of the functionality of 
ATCC 700755^T in the context of the ecosystem to which it is 
specialized. 

A list of proteins found to be relatively abundant (as defined 
above) and produced by genes only on GIs (supplementary 
table S1, Supplementary Material online) summarizes as far as 
possible the potential unique biology of ATCC 700755^T and 
provides a suite of relevant functions for its sea-ice ecosystem 
associations. In particular, two GIs (nos. 18 and 43) have a large 
number of moderately to highly abundant proteins while sev- 
eral GIs include no abundant proteins at all. The selective pres- 
sure enforced in a sea-ice ecosystem may lead to the retention 
of some GIs as stable sections of the genome, while others 
could be eventually lost. 

The EPS production by ATCC 700755^T, which includes mul- 
tiple sulphated, uronic acid-containing, and N-acetylated poly- 
saccharides, is prodigious, substantially increasing medium 
viscosity, with production levels and viscosity increasing with 
decreasing growth temperature (Mancuso Nichols 2005; 
Bowman 2008). GI no. 18 includes a 60-kb EPS biosynthesis 
locus (fig. 5). Though the exact structure of the EPS and its 
specific functional benefits remain to be elucidated, EPS pro- 
duction has been found to be a common feature of sea-ice 
bacteria and a crucial factor for sea-ice inhabitation. It provides 
cryoprotection (Marx et al. 2009), encourages ice crystal mod- 
ification, and retains liquid brine in brine channels, thereby en- 
hancing recruitment into forming sea ice (Ewert and 
Deming 2011; Krembs et al. 2011). It also has a role in nutrient 
acquisition, especially trace elements such as iron and cobalt, 
because anionic EPS can act as powerful ligands (Hassler et al. 
2011). The high level of translation of EPS biosynthesis gene 
products (fig. 5) suggests that EPS may be crucial for the 
low temperature growth and activity of ATCC 700755^T 
even outside of the sea-ice environment. The EPS cluster con- 
tains genes that match those of bacterial relatives within and 
outside of the phylum Bacteroidetes. Intermingled with these 
genes are intact and remnant transposase genes, as well as an 
MazEF family addiction module and duplicated genes coding 
putative UDP-glucose 6-dehydrogenases, suggesting recent 
acquisition via HGT (fig. 5). 

The second region of highly translated proteins is located 
on GI no. 43 and includes a 42-kb cluster of transporter-like 
proteins mostly of the protein family referred to as acidobac- 
terial duplicated orphan permeases (ADOP; P700755_03736 
to 03759) first observed in the genomes of acidobacteria 
(Ward et al. 2009). The ADOP proteins include a cluster of 
ten paralogs associated with other transporter proteins (fig. 6). 
The ADOPs all contain a MacB-periplasmic domain, suggest- 
ing they could be involved in efflux. The relatively high protein 
abundance and clustered nature of numerous ADOP paralogs 
and surrounding transporters (fig. 6) is intriguing and suggests 
that they have an important role in the biology of P. torquis 
ATCC 700755^T. Whether this role is for export of toxic prod- 
ucts of metabolism or deliberate release of products that can 
influence its interactions with other bacteria or algae is 
unknown. 

The genes coding the PUFA biosynthesis (pfa) cluster 
(P700755_01456 to _01462) (fig. 7) are located on a third 
GI (no. 17). The ability to synthesize omega-3 and omega-6- 
type PUFA is a rare trait among bacteria; it is restricted to class 
Gammaproteobacteria and phylum Bacteroidetes and, within 
those groups, is largely restricted to marine psychrophiles 
(Shulse and Allen 2011). The pfa cluster of ATCC 700755^T is 
similar to previously described clusters but has an altered struc- 
tural arrangement of conserved domains compared with 
those found in Gammaproteobacteria, suggesting a different 
evolutionary process of acquisition. ATCC 700755^T, which can 
synthesize eicosapentaenoic acid (EPA) via this cluster, has 
higher levels of EPA in its cytoplasmic membrane at low tem- 
peratures (Nichols et al. 1997). Protein abundances were 
found to be substantial for pfa cluster gene products 
(fig. 7). Unlike Gammaproteobacteria, which form either 
EPA or docosahexaenoic acid, ATCC 700755^T also forms ara- 
chidonic acid, though its levels do not increase with temper- 
ature (Nichols et al. 1997), so it may have another role. We 
assume arachidonic acid is another by-product from the pfa 
cluster. PUFA is capable of maintaining homeoviscosity of 
membranes at very low temperatures (Russell and Nichols 
1999; Usui et al. 2012) and, due to increasing cell hydropho- 
bicity, also potentially shields cells against hydrophilic toxic 
substances such as peroxides (Nishida et al. 2010). 

Fig. 5. The GI region 18 of P. torquis ATCC 700755^T containing a large EPS biosynthesis cluster that is highly translated and likely involved in the 
organism's manufacture of complex EPS. Relative abundance of gene products for this region is shown in the lower graph. Gray genes denote pseudogenes. 
Black genes denote intact transposases. Green genes are hypothetical proteins with conserved domain regions. Blue genes are enzymes involved in poly- 
saccharide biosynthesis including synthesis of modified sugars, glycosylation, polymerases, and flippases; P700755_001654 and 1656 are near identical 
copies of putative UDP-glucose 6-dehydrogenase genes. Yellow genes are associated with lipid metabolism and include two FabH homologs. A dark red gene 
denotes a NusG-like transcriptional elongation/antitermination factor. Pink genes separated by large intergenic regions include putative exported proteases. 
Purple genes include a MazEF family addiction module. Further details are shown in supplementary tables S1 and S3, Supplementary Material online. 

Two 
genome-sequenced relatives of P. torquis, strain SCB49 (re- 
lated to the genus Ulvibacter) isolated from the Arctic Ocean 
and a strain of the genus Dokdonia isolated from Arctic Ocean 
marine sediment (classified as Krokinobacter sp. 4H-3-7-5) 
possess pfa gene clusters very similar to that of P. torquis. 
This suggests that this type of pfa gene cluster could be 
prevalent in other psychrophilic members of the family 
Flavobacteriaceae. The pfa cluster in ATCC 700755^T is flanked 
by a number of intact and remnant transposases and a DNA- 
binding excisionase is located immediately upstream, suggest- 
ing that the pfa cluster may have been mobilized into the 
ATCC 700755^T genome via a phage insertion or conjugative 
transposon. PUFA production is also very sensitive to lipid ox- 
idation (Imlay 2003). The sea-ice environment is generally sat- 
urated with oxygen due to photosynthetic activity and low 
temperature (D'Amico et al. 2006), which may partly explain 
why ATCC 700755^T possesses a wide array of enzymes that 
provide immediate protection against reactive oxygen species 
and organic peroxides. The array of defences includes genes 
for two catalases (P700755_00288, _02059), seven peroxi- 
dases (P700755_00120, _0196, _01102, _01308, _03056, 
_03338, _03478), and three superoxide dismutases 
(P700755_00728, _00729, _01787). Some of these have 
homologs in ACAM 44^T, which likely also experiences photoox- 
idative stress in its lake environment. However, several are 
located on GIs in ATCC 700755^T, including genes coding a 
diheme cytochrome peroxidase (GI no. 41, P700755_03478), 
nickel- and iron-based superoxide dismutases (GI no. 
8, P700755_00728, _00729; GI no. 19, _01787), and a puta- 
tively secreted catalase (P700755_00288). 

Occurrence of the Proteorhodopsin Gene of P. torquis 
ATCC 700755^T at the Edge of a GI That Contains 
Putative Ice-Binding Proteins 

Some P. torquis genes are suspected of having a role in sea-ice 
inhabitation, but the functions remain tentative and the coded 
proteins were generally only weakly abundant in the proteo- 
mic survey. The suppositions are based on the likelihood they 
could be advantageous in a sea-ice ecosystem setting with 
translation requiring specific conditions. One such trait is the 
ability to bind and/or interact with ice crystalline surfaces, 
aiding recruitment and persistence within sea ice (Raymond 
et al. 2007). Genes that could have this role in ATCC 700755^T 
include a cluster of secreted proteins that may act as adhesins 
and that have a C-terminal domain homologous to other 
ice-binding proteins, including those of the sea-ice diatom 
F. cylindrus (Bayer-Giraldi et al. 2010). 

Fig. 6. The GI region 43 of P. torquis ATCC 700755^T containing a highly translated cluster of transporter proteins including eight ADOP family 
permease-like proteins. Relative protein abundance of gene products for this region is shown in the lower graph. Gray genes denote pseudogenes. Black 
genes denote intact transposases. Purple genes comprise an addiction module. Green genes denote the ADOP family permease-coding genes. Dark blue and 
indigo genes show different families of other transporters including those with ATP-binding regions. The orange gene codes a putative multifunctional acyl- 
CoA thioesterase. Other genes code hypothetical proteins. Further details are shown in supplementary tables S1 and S3, Supplementary Material online. 

Interestingly, the puta- 
tive ice-binding/adhesin proteins are located in a GI (no. 2), 
which has at its edge a proteorhodopsin gene and its cognate 
carotenoid monooxygenase and, immediately upstream, 
ABC-type transporters for complexed iron or cobalamin. The 
genetic arrangement of proteorhodopsin is largely conserved 
in ACAM 44^T; however, the putative ice-binding protein clus- 
ter is mostly absent (fig. 8). The adjacent ice-binding proteins 
can be readily detected but are far less abundant, perhaps 
pointing to a more generalized and important role for pro- 
teorhodopsin. The putative ice-active proteins found in P. tor- 
quis are related to other proteins found in bacteria, algae, and 
yeast (fig. 8), several of which have confirmed ice-binding and anti- 
freeze functions; nevertheless, substantial work is required to 
substantiate their functionality. Located on GI no. 17, adjacent 
to yet another putative ice-binding protein, lies a series of two- 
component sensor systems clustered over a 16-kb region, 
which are weakly orthologous to bacteriophytochromes and 
contain multiple Per-Arnt-Sim (PAS) and cGMP-specific phos- 
phodiesterases, adenylyl cyclases, and FhlA (GAF) domains 
that have been suggested to be light sensors in the genome 
of the proteorhodopsin-possessing strain MED152 (Gonzalez 
et al. 2008). These proteins are only weakly to moderately 
abundant and a photobiological function remains uncon- 
firmed. Psychroflexus torquis, however, responded signifi- 
cantly to light, increasing its growth yield by two to three 
times depending on the salinity and light level (Feng et al. 
2013), which suggests the presence of sensing and regulatory 
systems that must involve an ability to respond to changing 
light and salinity conditions. 

Fig. 8. The proteorhodopsin and putative ice-binding protein gene cluster associated with the GI region 2 of P. torquis ATCC 700755^T genome 
compared with equivalent region from P. gondwanensis ACAM 44^T. Genes shown in gray are pseudogenes (see supplementary tables S1-S3, 
Supplementary Material online, for more details). Known and putative ice-binding proteins within GI no. 2 are compared with other equivalent proteins 
from bacteria, diatoms (Fragilariopsis spp.), and yeast (Glaciozyma antarctica) in a protein sequence-based tree, where distances were calculated with the 
Grishin algorithm and clustering was via Neighbor-Joining, calculated using the constraint-based multiple alignment tool (www.ncbi.nlm.nih.gov, last 
accessed January 8, 2014) with default parameters. 

Aspects of the P. torquis ATCC 700755 Genome Linked 
to Nutrient Acquisition and Algal Colonization in Sea Ice 

A critical aspect for sea-ice adaptation is nutrient acquisition. 
The nature of this ability can be partly surmised by the already 
known phenotype of strains of P. torquis. The species uses an 
eclectic range of carbon and energy sources including limited 
ranges of carbohydrates, amino acids, organic acids, and odd 
chain length lipid oxidation products (Bowman et al. 1998). In 
sea ice, levels of dissolved organic compounds are at much 
higher levels than in the water column; they include large 
amounts of exopolymeric material, carbohydrates, free 
amino acids, and lipids (Norman et al. 2011). ATCC 700755^T 
was thus expected to have a complement of genes specifically 
geared to access abundant substrates formed during algal 
primary production, which it may share with ACAM 44^T. 
Both strains are relatively fastidious and require growth fac- 
tors, including vitamin B6 and cobalamin, consistent with their 
gene content for cofactor metabolism. Several nutrient acqui- 
sition-related proteins coded by GI-associated genes were 
found to be comparatively abundant based on proteomic 
analysis (supplementary table S1, Supplementary Material 
online). These included a number of secreted and nonse- 
creted peptidases (GI no. 4, glutamate carboxypeptidase II and 
L-carnosine dipeptidase; GI no. 13, two subtilisin-like prote- 
ases); enzymes for catabolism of Di-hydroxyproline (GI no. 
34, HCP deaminase, 4-hydroxyproline epimerase); a putative 
SGLT1-like family glucose/Na+ ion symporter; and several 
TonB-dependent outer membrane receptor and binding pro- 
teins, mainly of the RagA/SusCD families (GI nos. 5, 13, 16), 
which typically take up carbohydrates, polypeptides, and/or 
chelated metallic cations. Because sea ice is a subzero temper- 
ature environment, these proteins are almost certainly cold- 
active (Huston et al. 2000). The genome of C. psychrerythraea 
34H was noted to possess a large number of extracellular 
enzyme coding genes (Methe et al. 2005), and dwelling at 
extreme cold seems to require the production of large 
amounts of extracellular enzymes to overcome the mass trans- 
fer limitations caused by low-temperature impairment of 
enzyme function and transport (Struvay and Feller 2012). 

Another aspect of sea-ice inhabitation is the link P. torquis 
has to algae as an epiphyte. Given that P. torquis has substan- 
tial growth factor requirements and is slow growing (fastest 
doubling time ~1 day), persisting in a dynamic system may 
require close interaction with sources of nutrients. Details of 
bacterial/algal associations derivable from the genome data 
are so far effectively limited to inferences; however, they are 
compelling and provide several possible lines of research on 
competitive and mutualistic interactions in sea ice. Both ATCC 
700755^T and ACAM 44^T possess a conserved set of surface- 
gliding motility-associated genes (McBride and Zhu 2013) and 
the associated Por secretion system (Sato et al. 2010); how- 
ever, they strongly differ in terms of cell-surface proteins and 
the presence of several adhesin-like proteins (fig. 2). ATCC 
700755^T is able to perform a form of slow gliding motility 
(Bowman et al. 1998) that has not been demonstrated in 
ACAM 44^T. 

Putative adhesins present in ATCC 700755^T include several 
types such as the aforementioned putative ice-binding 
adhesin-like proteins, VCBS, and fasciclin repeat domain-con- 
taining proteins and autotransporter adhesins. ATCC 700755^T 
also possesses a large surface protein of 468 kDa 
(P700755_00663) homologous to colossin A protein from 
the slime mold Dictyostelium discoideum (Whitney et al. 
2010) that could also be involved with surface interactions. 
The diverse range of these proteins along with ice-active pro- 
teins seems to suggest a flexible attachment ability potentially 
necessary in the highly dynamic sea-ice ecosystem. 

The possibility emerges that the algal interactions of ATCC 
700755 run deeper than simple commensalism. Both ATCC 
700755^T and ACAM 44^T strains possess several signaling 
proteins that putatively respond to the presence of plant hor- 
mones, including GH3 auxin promoter proteins that may 
allow them to coordinate growth and activity with that of 
algal hosts (Lambrecht et al. 2000). Unusually, ATCC 
700755^T secretes 2-phenylethylamine (PEA) in substantial 
levels (Hamana and Niitsu 2001). The physiological function 
of PEA in relation to algae is unclear, but PEA has been found 
to induce production of oxidative bursts in tobacco, which has 
relatively high levels of PEA in its leaves, and could be linked to 
triggering defence systems (Kawano et al. 2000). An intrigu- 
ing 35-kb cluster of proteins (P700755_01235 to _01257) on 
GI no. 15, neighboring a glycogen synthesis cluster 
(P700755_01229 to _01232) could also be involved in algal 
interactions (fig. 9). The cluster includes signaling proteins 
with cyclases/histidine kinase associated sensory extracellular 
(CHASE) domains as well as several large WD40 repeat do- 
mains containing four metacaspase family proteins. 
Metacaspases have been linked to programmed cell death 
functions in lower eukaryotes including phytoplankton 
(Madeo et al. 2012; Choi and Berges 2013). At this stage, 
few details on the functionality of bacterial metacaspases 
are available (Vercammen et al. 2007). Cooperative and 
noncooperative interactive mechanisms between algae and 
bacteria, whose populations are functionally tightly coupled, 
are a critical aspect of sea-ice ecology, yet remain largely 
unknown. 

Conclusions 

Staley and Gosink (1999) indicated that a number of bacterial 
genera exist in both Arctic and Antarctic sea ice, but whether 
this finding can be extended to the species level was unknown. 
They argued that the global distri- 
bution of psychrophiles was essentially blocked by a tropical 
marine barrier, providing the possibility that ecosystem-linked 
endemism could emerge. Recently, evidence based on high- 
throughput sequencing of marine microbial communities has 
been presented that, like macroscopic organisms, marine bac- 
teria seem to be subject to biogeographic limitations affecting 
their current and presumably long-term distribution. Not only 
do these proposed limitations suggest the potential for local- 
ized speciation in certain ecosystems (Whitaker 2006), but they 
also suggest that some communities of bacteria are vulnera- 
ble to extinction brought about by external disruption such as 
climate change, habitat destruction, or invasion by other or- 
ganisms (Sul et al. 2013). 

Fig. 9. The GI region 15 of P. torquis ATCC 700755^T that contains a cluster of putative metacaspase coding genes (orange) following a glycogen- 
synthesis/utilization cluster (blue) with a possible role linked to programmed cell death. The domain structure of metacaspase genes and associated genes are 
indicated. Genes flagged with asterisks are those with signal peptide regions and thus putatively secreted. A putative LytTR family regulatory protein is coded 
by gene P700755_001257, while a CHASE2 family protein (usually associated with extracellular sensory proteins) is shown in green. The region is flanked by 
transposases (black genes) and two tRNA-Val genes. 

Here we demonstrate the compar- 
ative genomic features of the bipolar bacterial species 
P. torquis that could be an excellent example of evolving en- 
demism in a bacterial species. A next step in genome-level 
analysis would be to compare Arctic and Antarctic strains to 
determine genetic similarity and degree of change, especially 
in the number and content of GIs, relative state of gene decay 
and overall occurrence of HGT gene regions, which make up 
~35% of the genome of strain ATCC 700755^T. This major 
finding of our study is consistent with the suggestion, based 
on abundance of phage and extracellular DNA in sea-ice 
brines, that sea ice is a hotspot for HGT (Collins and Deming 
2013). High levels of HGT in other sea ice-associated bacteria 
can also be expected. Sea ice or polar seawater-associated 
Octadecabacter species, which possess the light-driven 
proton pump xanthorhodopsin, also show evidence of high 
levels of HGT and have genomes rich in IS elements, pseudo- 
genes, and plasmids (Vollmers et al. 2013). Sea-ice specialism 
apparent in P. torquis appears to be linked to its GI-associated 
genes, including those for EPS and PUFA synthesis, modes of 
nutrient acquisition and potential ice-binding and algal inter- 
actions. Proteomic data provided evidence that many of the 
associated genes are being actively translated and are thus 
important to the biology of P. torquis. Overall, this study sug- 
gests P. torquis could be an excellent model to study sea-ice 
functional biology and evolutionary processes linked with psy- 
chrophily, endemism, and algal interactions. 

Supplementary figure S1 and tables S1-S3 are available at 
Genome Biology and Evolution online (http://www.gbe. 